Generated on 2017-05-08 14:29:18

Project_id: CA-PM-20170506_p1p4

1 Sync and split

1.1 raw data: 170506_NB501128_0084_AH7N2VBGX2

sync time

  • Start: Mon May 8 09:20:01 CST 2017

  • Finish: Mon May 8 11:03:01 CST 2017

  • Time: 1.71667h

  • data_size: 98.76G ; speed: 15.98M/s

split time

  • Start: Mon May 8 11:02:09 CST 2017

  • Finish: Mon May 8 13:19:02 CST 2017

  • Time: 2.28139h

split data

  • unmatched 48119.9 Mb

  • matched 105634 Mb

  • total 153754 Mb

  • low_data samples 0

1.2 raw data: 170502_NB501128_0082_AHTLN7BGX2

sync time

  • Start: Thu May 4 05:40:01 CST 2017

  • Finish: Thu May 4 06:54:53 CST 2017

  • Time: 1.24778h

  • data_size: 66.79G ; speed: 14.87M/s

split time

  • Start: Thu May 4 06:56:36 CST 2017

  • Finish: Thu May 4 10:32:29 CST 2017

  • Time: 3.59806h

split data

  • unmatched 36732 Mb

  • matched 93732 Mb

  • total 130464 Mb

  • low_data samples 0

1.3 raw data: 170504_NB501128_0083_AHTV5NBGX2

sync time

  • Start: Sat May 6 02:20:01 CST 2017

  • Finish: Sat May 6 03:19:30 CST 2017

  • Time: 0.991389h

  • data_size: 50.86G ; speed: 14.25M/s

split time

  • Start: Sat May 6 03:22:48 CST 2017

  • Finish: Sat May 6 05:27:50 CST 2017

  • Time: 2.08389h

split data

  • unmatched 27080.1 Mb

  • matched 73056.4 Mb

  • total 100137 Mb

  • low_data samples 0
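
The Time, speed, and total figures reported in sections 1.1 to 1.3 can be rechecked from the raw timestamps and sizes. The following is a minimal sketch using the section 1.1 values; it assumes that "G" and "M" are decimal units (1 G = 1000 M), which is an assumption rather than something the report states.

```r
# Timestamps copied from section 1.1 (sync step).
# The "CST" label is dropped: both stamps share one zone, so the difference is unaffected.
start  <- as.POSIXct("2017-05-08 09:20:01", tz = "UTC")
finish <- as.POSIXct("2017-05-08 11:03:01", tz = "UTC")

# Elapsed wall-clock time in hours.
elapsed_h <- as.numeric(difftime(finish, start, units = "hours"))

# Assumption: decimal units, so 98.76 G = 98760 M.
data_size_g <- 98.76
speed_m_s <- data_size_g * 1000 / (elapsed_h * 3600)

# Split totals: unmatched + matched, in Mb.
total_mb <- 48119.9 + 105634

print(round(elapsed_h, 5))
print(round(speed_m_s, 2))
print(total_mb)
```

Under the decimal-unit assumption these values agree with the reported 1.71667h, 15.98M/s, and (after rounding) 153754 Mb.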

2 Analysis

2.1 analysis time

  • Round 1: Mon May 8 13:18:35 CST 2017

  • Round 2: Mon May 8 13:17:27 CST 2017

  • Finish: Mon May 8 14:26:23 CST 2017

  • Time: 1.13h

2.2 pass samples:

OG173110002N1LEUD2kx9b2

OG175310093N1LEUD2kx9b2

OG178510071N1LEUD2kx9b2

2.3 reject samples:

  • OG160200008N1LEUD2kx9b2

OG160200008N1LEUD2kx9b2 on capture_depth_1X with 608 outside 1000.0 to 4000

OG160200008N1LEUD2kx9b2 on seq_depth with 1485.83 outside 2000 to 6000

  • OG170250097N1LEUD2kx9b1

OG170250097N1LEUD2kx9b1 on capture_depth_1X with 638 outside 1000.0 to 4000

OG170250097N1LEUD2kx9b1 on %(G+C) with 51.06 outside 40 to 50

OG170250097N1LEUD2kx9b1 on seq_depth with 1435.96 outside 2000 to 6000

  • OG170250097T1HYTD2kx9b2

OG170250097T1HYTD2kx9b2 on capture_depth_1X with 756 outside 1000.0 to 4000

OG170250097T1HYTD2kx9b2 on seq_depth with 1824.8 outside 2000 to 6000

  • OG170250216N1LEUD2kx9b1

OG170250216N1LEUD2kx9b1 on capture_depth_1X with 617 outside 1000.0 to 4000

OG170250216N1LEUD2kx9b1 on %(G+C) with 50.74 outside 40 to 50

OG170250216N1LEUD2kx9b1 on seq_depth with 1131.57 outside 2000 to 6000

  • OG170250216T1FRED2kx9b2

OG170250216T1FRED2kx9b2 on capture_depth_1X with 796 outside 1000.0 to 4000

OG170250216T1FRED2kx9b2 on seq_depth with 1868.89 outside 2000 to 6000

  • OG173710152N1LEUD2kx9b2

OG173710152N1LEUD2kx9b2 on capture_depth_1X with 986 outside 1000.0 to 4000

  • OG174310048N1LEUD2kx9b2

OG174310048N1LEUD2kx9b2 on capture_depth_1X with 739 outside 1000.0 to 4000

OG174310048N1LEUD2kx9b2 on seq_depth with 1763.07 outside 2000 to 6000

  • OG174310053N1LEUD2kx9b2

OG174310053N1LEUD2kx9b2 on capture_depth_1X with 829 outside 1000.0 to 4000

OG174310053N1LEUD2kx9b2 on seq_depth with 1931.95 outside 2000 to 6000

  • OG177910001N1LEUD2kx9b2

OG177910001N1LEUD2kx9b2 on capture_depth_1X with 746 outside 1000.0 to 4000

OG177910001N1LEUD2kx9b2 on seq_depth with 1754.58 outside 2000 to 6000

  • OG178510059N1LEUD2kx9b2

OG178510059N1LEUD2kx9b2 on capture_depth_1X with 912 outside 1000.0 to 4000
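
The reject messages above follow one pattern: a metric value is compared against an accepted range and reported when it falls outside. A minimal sketch of such a check is given below. `check_sample()` and `qc_ranges` are hypothetical names, not the pipeline's own code, and the sketch assumes that a value exactly on a bound passes.

```r
# Accepted ranges as they appear in the reject messages above.
qc_ranges <- data.frame(
  metric = c("capture_depth_1X", "%(G+C)", "seq_depth"),
  lo     = c(1000, 40, 2000),
  hi     = c(4000, 50, 6000),
  stringsAsFactors = FALSE
)

# check_sample() is a hypothetical helper, not the pipeline's own function.
# `values` is a named numeric vector keyed by metric name.
# Assumption: values exactly on a bound are accepted.
check_sample <- function(sample_id, values, ranges = qc_ranges) {
  msgs <- character(0)
  for (i in seq_len(nrow(ranges))) {
    v <- values[[ranges$metric[i]]]
    if (v < ranges$lo[i] || v > ranges$hi[i]) {
      msgs <- c(msgs, sprintf("%s on %s with %s outside %s to %s",
                              sample_id, ranges$metric[i], v,
                              ranges$lo[i], ranges$hi[i]))
    }
  }
  msgs
}

# One rejected and one passed sample, with values from the section 3.3 table.
rejected <- check_sample(
  "OG170250097N1LEUD2kx9b1",
  c("capture_depth_1X" = 638, "%(G+C)" = 51.06, "seq_depth" = 1435.96)
)
passed <- check_sample(
  "OG173110002N1LEUD2kx9b2",
  c("capture_depth_1X" = 1021, "%(G+C)" = 48.775, "seq_depth" = 2391.36)
)
print(rejected)
print(length(passed))
```

The message wording mirrors the report, except that the lower bound prints as 1000 rather than 1000.0.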

2.4 reject samples (loose):

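3 QC metrics for the latest batch

This section summarizes the sequencing results of tumor samples from the current CN500 sequencer, from the following angles and in the following steps.

32 QC metrics: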

prj_sample, sample, size_Gb, GC, N, Q20, Q30, low_qual_filter, adapter_filter, undersize_ins_filter, duplicated_filter, clean_size_Gb, clean_GC, clean_N, clean_Q20, clean_Q30, coverage, mapping_rate, coverage_cent, specificity_cent, uniformity_cent, panel_dep, samtools_dups, insert_size, seq_dep, trim_adapter, mut_dep, sample_type, panel_type, date, eff_seq, eff_mut

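Key metrics:

  • seq_dep: raw data yield / panel size, i.e. the expected depth
  • panel_dep: the depth actually captured within the panel
  • eff_seq: panel_dep/seq_dep, i.e. data utilization of the experimental stage
  • mut_dep: mean depth at mutation sites obtained during analysis
  • eff_mut: mut_dep/panel_dep, i.e. data utilization of the analysis stage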
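
The two utilization ratios defined above are plain column divisions. The sketch below recomputes them for three rows copied from the section 3.3 table; the data frame name `qc` is chosen here for illustration only.

```r
# Three rows copied from the section 3.3 table (samples 1, 2 and 6).
qc <- data.frame(
  sample    = c("OG160200008N1LEUD2kx9b2", "OG170250097N1LEUD2kx9b1",
                "OG173110002N1LEUD2kx9b2"),
  seq_dep   = c(1485.83, 1435.96, 2391.36),
  panel_dep = c(608, 638, 1021),
  mut_dep   = c(464, NA, 750)
)

# Experimental-stage utilization: captured depth over expected depth.
qc$eff_seq <- qc$panel_dep / qc$seq_dep

# Analysis-stage utilization: mutation-site depth over captured depth.
# A missing mut_dep propagates to a missing eff_mut.
qc$eff_mut <- qc$mut_dep / qc$panel_dep

print(qc)
```

Because `NA` propagates through the division, any later summary of eff_mut needs to handle missing values explicitly, for example with `na.rm = TRUE`.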

QC factors of interest at each stage:

  • Experiment: eff_seq, GC, dups, specificity
  • Analysis: eff_mut, dups
  • Stability: between samples, between batches

The latest batch, 20170506_p1p4, contains 13 samples

## 
##   LEU OTHER 
##    11     2

3.1 Overview plot of all QC metrics

3.2 Individual plots for each metric

3.3 Statistics for each metric

##                                     prj_sample                  sample
## 1  CA-PM-20170506_p1p4/OG160200008N1LEUD2kx9b2 OG160200008N1LEUD2kx9b2
## 2  CA-PM-20170506_p1p4/OG170250097N1LEUD2kx9b1 OG170250097N1LEUD2kx9b1
## 3  CA-PM-20170506_p1p4/OG170250097T1HYTD2kx9b2 OG170250097T1HYTD2kx9b2
## 4  CA-PM-20170506_p1p4/OG170250216N1LEUD2kx9b1 OG170250216N1LEUD2kx9b1
## 5  CA-PM-20170506_p1p4/OG170250216T1FRED2kx9b2 OG170250216T1FRED2kx9b2
## 6  CA-PM-20170506_p1p4/OG173110002N1LEUD2kx9b2 OG173110002N1LEUD2kx9b2
## 7  CA-PM-20170506_p1p4/OG173710152N1LEUD2kx9b2 OG173710152N1LEUD2kx9b2
## 8  CA-PM-20170506_p1p4/OG174310048N1LEUD2kx9b2 OG174310048N1LEUD2kx9b2
## 9  CA-PM-20170506_p1p4/OG174310053N1LEUD2kx9b2 OG174310053N1LEUD2kx9b2
## 10 CA-PM-20170506_p1p4/OG175310093N1LEUD2kx9b2 OG175310093N1LEUD2kx9b2
## 11 CA-PM-20170506_p1p4/OG177910001N1LEUD2kx9b2 OG177910001N1LEUD2kx9b2
## 12 CA-PM-20170506_p1p4/OG178510059N1LEUD2kx9b2 OG178510059N1LEUD2kx9b2
## 13 CA-PM-20170506_p1p4/OG178510071N1LEUD2kx9b2 OG178510071N1LEUD2kx9b2
##    size_Gb     GC     N    Q20    Q30 low_qual_filter adapter_filter
## 1   0.0827 49.890 0.115 89.150 82.990         7.12940       0.456703
## 2   0.0799 51.060 0.100 95.200 92.045         3.31005       0.207781
## 3   0.1016 46.645 0.105 89.560 83.345         6.61774       0.283518
## 4   0.0630 50.740 0.140 95.955 92.870         3.00972       0.141257
## 5   0.1040 49.925 0.105 90.010 84.135         6.00480       0.389828
## 6   0.1332 48.775 0.110 90.175 84.285         6.13978       0.444264
## 7   0.1286 49.805 0.115 90.145 84.315         6.04230       0.367861
## 8   0.0981 48.170 0.105 90.090 84.120         6.19061       0.310002
## 9   0.1075 46.345 0.110 89.875 83.755         6.53646       0.282496
## 10  0.1406 47.750 0.110 89.815 83.730         6.65286       0.319571
## 11  0.0977 44.870 0.115 90.295 84.290         6.22008       0.251958
## 12  0.1211 49.225 0.105 90.420 84.690         5.80633       0.294168
## 13  0.1426 46.780 0.100 90.280 84.390         5.79150       0.299274
##    undersize_ins_filter duplicated_filter clean_size_Gb clean_GC clean_N
## 1                     0           4.53609        0.0726   50.065   False
## 2                     0          10.93310        0.0684   51.065   False
## 3                     0           4.18363        0.0903   46.805   False
## 4                     0           7.48346        0.0562   50.800   False
## 5               48.2164           2.95642        0.0411   51.060   False
## 6                     0           5.85512        0.1165   48.905   False
## 7                     0           5.80904        0.1128   49.950   False
## 8                     0           5.27317        0.0865   48.325   False
## 9                     0           5.88394        0.0938   46.570   False
## 10                    0           5.63580        0.1228   47.935   False
## 11                    0           6.22530        0.0852   45.060   False
## 12                    0           5.79539        0.1066   49.375   False
## 13                    0           5.86790        0.1255   46.915   False
##    clean_Q20 clean_Q30 coverage mapping_rate coverage_cent
## 1     91.055    85.140   0.0025       0.9979        0.9609
## 2     96.115    93.055   0.0029       0.9987        0.9267
## 3     91.265    85.275   0.0029       0.9982        0.9996
## 4     96.865    93.895   0.0010       0.9983        0.9359
## 5     91.485    86.075   0.0027       0.9981        0.9160
## 6     91.755    86.060   0.0026       0.9980        0.9723
## 7     91.720    86.080   0.0025       0.9981        0.9555
## 8     91.680    85.910   0.0027       0.9983        0.9817
## 9     91.525    85.620   0.0021       0.9982        0.9997
## 10    91.510    85.645   0.0038       0.9980        0.9887
## 11    91.860    86.055   0.0020       0.9982        0.9996
## 12    91.950    86.400   0.0025       0.9983        0.9697
## 13    91.755    86.040   0.0027       0.9983        0.9954
##    specificity_cent uniformity_cent panel_dep samtools_dups insert_size
## 1            0.6059          0.9979       608        0.1448       261.0
## 2            0.6633          0.9963       638        0.1640       181.9
## 3            0.5857          0.9999       756        0.1341       187.3
## 4            0.6649          0.9967       617        0.1232       188.1
## 5            0.6146          0.9967       796        0.1526       172.9
## 6            0.5855          0.9985      1021        0.1642       180.5
## 7            0.5855          0.9977       986        0.1640       179.3
## 8            0.6049          0.9989       739        0.1497       177.2
## 9            0.5838          0.9999       829        0.1579       180.7
## 10           0.5969          0.9993      1062        0.1597       183.0
## 11           0.5741          0.9999       746        0.1600       183.9
## 12           0.5783          0.9979       912        0.1569       180.4
## 13           0.5850          0.9996      1105        0.1645       181.9
##    seq_dep trim_adapter mut_dep sample_type panel_type          date
## 1  1485.83    0.0622592     464         LEU       p1p4 20170506_p1p4
## 2  1435.96    0.0614643      NA         LEU       p1p4 20170506_p1p4
## 3  1824.80    0.0521457    1332       OTHER       p1p4 20170506_p1p4
## 4  1131.57    0.0522369      NA         LEU       p1p4 20170506_p1p4
## 5  1868.89    0.0700785    1347       OTHER       p1p4 20170506_p1p4
## 6  2391.36    0.0593041     750         LEU       p1p4 20170506_p1p4
## 7  2309.67    0.0602870     747         LEU       p1p4 20170506_p1p4
## 8  1763.07    0.0651162     549         LEU       p1p4 20170506_p1p4
## 9  1931.95    0.0603952     561         LEU       p1p4 20170506_p1p4
## 10 2524.71    0.0540815     756         LEU       p1p4 20170506_p1p4
## 11 1754.58    0.0558601     538         LEU       p1p4 20170506_p1p4
## 12 2175.44    0.0603850     765         LEU       p1p4 20170506_p1p4
## 13 2562.49    0.0586892     726         LEU       p1p4 20170506_p1p4
##      eff_seq   eff_mut
## 1  0.4091989 0.7631579
## 2  0.4443021        NA
## 3  0.4142920 1.7619048
## 4  0.5452601        NA
## 5  0.4259213 1.6922111
## 6  0.4269537 0.7345739
## 7  0.4269008 0.7576065
## 8  0.4191552 0.7428958
## 9  0.4291001 0.6767189
## 10 0.4206424 0.7118644
## 11 0.4251730 0.7211796
## 12 0.4192255 0.8388158
## 13 0.4312212 0.6570136

4 Data quality of the 5 most recent batches

4.1 Basic information

The 5 most recent batches: 20170506_p1p4, 20170504_p1p4, 20170504_p1p2p4, 20170502_p1p4, 20170502_p1p2p4

179 samples in total

Counts by sample type, panel type, and date batch:

table(dd$sample_type)
## 
##    CF  FFPE   LEU OTHER 
##    39     3   130     7
table(dd$panel_type)
## 
## p1p2p4   p1p4 
##     25    154
table(dd$date)
## 
## 20170502_p1p2p4   20170502_p1p4 20170504_p1p2p4   20170504_p1p4 
##              15              89              10              52 
##   20170506_p1p4 
##              13

4.1.1 Samples with too little data

thre_low <- 0.01
date_index <- first_index(dd$date)
ggplot(dd) + geom_point(aes(x = seq(nrow(dd)), y = log10(size_Gb * 1000), color = group_as_two(date))) + geom_hline(yintercept = log10(thre_low * 1000), color = 'red') + geom_text(aes(x = nrow(dd) / 2, label = paste("threshold:", thre_low, "Gb"), y = log10(thre_low * 1000) - 0.1)) + labs(title = 'low data size') + annotate("text", x = date_index, y = rep(thre_low, length(date_index)), label = dd$date[date_index], angle = 90, alpha = 0.5) + theme(legend.position = "none")

dd[which(dd$size_Gb < thre_low), c('prj_sample', 'size_Gb')]
## [1] prj_sample size_Gb   
## <0 rows> (or 0-length row.names)
dd <- dd[which(dd$size_Gb >= thre_low), ]

After removing low-yield samples (raw data yield below 0.01 Gb), 179 samples remain

Counts by category after removing low-yield samples:

table(dd$sample_type)
## 
##    CF  FFPE   LEU OTHER 
##    39     3   130     7
table(dd$panel_type)
## 
## p1p2p4   p1p4 
##     25    154
table(dd$date)
## 
## 20170502_p1p2p4   20170502_p1p4 20170504_p1p2p4   20170504_p1p4 
##              15              89              10              52 
##   20170506_p1p4 
##              13

4.1.2 Data size

dd$expect_size <- c(unlist(sapply(dd$sample_type, expect_size)))
ggplot(dd) + geom_boxplot(aes(date, size_Gb), alpha = 0.3) + geom_violin(aes(date, size_Gb), alpha = 0.6) + geom_jitter(aes(date, size_Gb)) + geom_hline(aes(yintercept = 0.3 * expect_size), color = "red", alpha = 0.6) + geom_hline(aes(yintercept = 2 * expect_size), color = "red", alpha = 0.6) + labs(title = 'data size by sample_type') + theme(legend.position = "none", axis.text.x = element_text(angle = 20)) + facet_wrap(~sample_type, scales = 'free')
## Warning in max(data$density): no non-missing arguments to max; returning -Inf

#ggplot(dd) + geom_boxplot(aes(sample_type, size_Gb), alpha = 0.3) + geom_violin(aes(sample_type, size_Gb), alpha = 0.6) + geom_jitter(aes(sample_type,  size_Gb)) + labs(title = 'data size by date') + theme(legend.position = "none") + facet_wrap(~date, scales = 'free')

Batch-to-batch stability of raw data yield

4.2 Experimental stage

4.2.1 Effective utilization of sequencing data

thre_eff_seq <- 0.33
ggplot(dd) + geom_boxplot(aes(sample_type, eff_seq), alpha = 0.3) + geom_violin(aes(sample_type, eff_seq), alpha = 0.6) + geom_jitter(aes(sample_type,  eff_seq)) + geom_hline(yintercept = thre_eff_seq, color = 'red') + labs(title = 'panel_dep/seq_dep by sample_type')

ggplot(dd) + geom_boxplot(aes(sample_type, eff_seq), alpha = 0.3) + geom_violin(aes(sample_type, eff_seq), alpha = 0.6) + geom_jitter(aes(sample_type,  eff_seq)) + geom_hline(yintercept = thre_eff_seq, color = 'red') + labs(title = 'panel_dep/seq_dep by sample_type by date') + facet_wrap(~date, scales = 'free_y')

Stability of raw data utilization (eff_seq): mean 0.3840941, SD 0.1576461; for cfDNA samples: mean 0.3924713, SD 0.1247553
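
The chunk that produces these mean and SD figures is not shown in the report. A minimal sketch of one way to compute them overall and per sample type follows. It uses only the 13 samples of batch 20170506_p1p4 from the section 3.3 table, so its numbers are not expected to match the 179-sample figures; `dd_latest` is a name introduced here for illustration.

```r
# eff_seq and sample_type for the 13 samples of batch 20170506_p1p4 (section 3.3).
dd_latest <- data.frame(
  sample_type = c("LEU", "LEU", "OTHER", "LEU", "OTHER", "LEU", "LEU",
                  "LEU", "LEU", "LEU", "LEU", "LEU", "LEU"),
  eff_seq = c(0.4091989, 0.4443021, 0.4142920, 0.5452601, 0.4259213,
              0.4269537, 0.4269008, 0.4191552, 0.4291001, 0.4206424,
              0.4251730, 0.4192255, 0.4312212)
)

# Overall mean and standard deviation.
overall <- c(mean = mean(dd_latest$eff_seq), sd = sd(dd_latest$eff_seq))

# Per sample type; aggregate() returns a matrix column, flattened below.
by_type <- aggregate(eff_seq ~ sample_type, data = dd_latest,
                     FUN = function(x) c(mean = mean(x), sd = sd(x)))
by_type <- do.call(data.frame, by_type)

print(overall)
print(by_type)
```

With only two OTHER samples in this batch, the per-type SD for that group is a rough indication at best.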

4.2.2 Relationship between sequencing depth and sequencing data utilization

ggplot(dd) + geom_point(aes(seq_dep, panel_dep)) + geom_smooth(aes(seq_dep, panel_dep)) + facet_wrap(~sample_type, scales = 'free') + labs(title = 'seq eff')
## `geom_smooth()` using method = 'loess'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 864.46
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 3.3025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 3.9167e+05
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : span too small.
## fewer data values than degrees of freedom.
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
## at 864.46
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 3.3025
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal
## condition number 0
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other
## near singularities as well. 3.9167e+05

ggplot(dd) + geom_point(aes(seq_dep, eff_seq, color = date)) + geom_smooth(aes(seq_dep, eff_seq)) + labs(title = 'eff_seq along with seq_dep') + facet_wrap(~sample_type, scales = c('free'))
## `geom_smooth()` using method = 'loess'
## Warning: the same simpleLoess and predLoess warnings as for the previous plot (span too small; pseudoinverse used at 864.46; near singularities)

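Higher sequencing depth yields more usable data, but utilization shows no clear linear correlation with depth because it fluctuates widely (the saturation depth limit is 10000X)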
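
One way to put numbers on this observation is to compare correlation coefficients. The sketch below uses the 13 samples from the section 3.3 table only, so it illustrates the method rather than reproducing the 5-batch result; the variable names are chosen here for illustration.

```r
# seq_dep and panel_dep for the 13 samples in the section 3.3 table.
seq_dep <- c(1485.83, 1435.96, 1824.80, 1131.57, 1868.89, 2391.36, 2309.67,
             1763.07, 1931.95, 2524.71, 1754.58, 2175.44, 2562.49)
panel_dep <- c(608, 638, 756, 617, 796, 1021, 986, 739, 829, 1062, 746, 912, 1105)
eff_seq <- panel_dep / seq_dep

# Pearson r measures linear association; Spearman rho is rank based and
# less sensitive to a single outlying sample.
r_panel <- cor(seq_dep, panel_dep)
r_eff   <- cor(seq_dep, eff_seq)
rho_eff <- cor(seq_dep, eff_seq, method = "spearman")

print(c(r_panel = r_panel, r_eff = r_eff, rho_eff = rho_eff))
```

A high `r_panel` together with an `r_eff` that is weak or driven by a few samples would be consistent with the statement above; with only 13 points, neither coefficient should be over-interpreted.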

4.2.3 GC

ggplot(dd) + geom_point(aes(GC, eff_seq, color = date, size = panel_type, shape = sample_type), alpha = 0.6) + stat_smooth(aes(GC, eff_seq))  + labs(title = 'eff_seq ~ GC') 
## `geom_smooth()` using method = 'loess'

ggplot(dd) + geom_point(aes(GC, eff_seq, color = sample_type, size = panel_type), alpha = 0.6) + stat_smooth(aes(GC, eff_seq)) + labs(title = 'eff_seq ~ GC by sample') + facet_wrap(~sample_type, scales = "free_x") 
## `geom_smooth()` using method = 'loess'

ggplot(dd) + geom_point(aes(GC, eff_seq, color = sample_type, size = panel_type), alpha = 0.6) + stat_smooth(aes(GC, eff_seq)) + labs(title = 'eff_seq ~ GC by sample by date') + facet_wrap(~date, ncol = 3, scales = "free") 
## `geom_smooth()` using method = 'loess'

GC content affects data utilization; a reasonable range is the expected GC% ± 2% (as in NIPT).
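
The ± 2% window can be expressed as a simple check. In the sketch below, `gc_in_window()` is a hypothetical helper, the window is interpreted as percentage points, and the expected GC% of 48 is a placeholder, since the report does not state the expected value for this panel.

```r
# gc_in_window() is a hypothetical helper: TRUE when GC% lies within
# `tol` percentage points of `expected_gc`.
gc_in_window <- function(gc, expected_gc, tol = 2) {
  abs(gc - expected_gc) <= tol
}

# Raw GC% of the first three samples in the section 3.3 table.
gc <- c(49.890, 51.060, 46.645)

# Assumption: 48 is a placeholder expected GC%, not a value from this report.
in_window <- gc_in_window(gc, expected_gc = 48)
print(in_window)
```

The expected GC% should be replaced with the panel's actual design value before the check is used for QC.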

4.2.4 Q30

thre_q30 <- 80
ggplot(dd) + geom_boxplot(aes(sample_type, Q30), alpha = 0.3) + geom_violin(aes(sample_type, Q30), alpha = 0.6) + geom_jitter(aes(sample_type,  Q30), size = 0.2) + geom_hline(yintercept = thre_q30, color = 'red') + labs(title = 'Q30 by date') + facet_wrap(~date, scales = c('free'))

Q30 threshold: 80% (Q30 is reported in percent)

4.2.5 Probe capture specificity

ggplot(dd_sub) + geom_boxplot(aes(sample_type, specificity_cent), alpha = 0.3) + geom_violin(aes(sample_type, specificity_cent), alpha = 0.6) + geom_jitter(aes(sample_type,  specificity_cent)) + labs(title = 'specificity_cent by sample_type by date') + facet_wrap(~date, scales = 'free_y') + geom_hline(yintercept = 0.6, color = "red")

Largely normal

4.2.6 Adapter contamination rate

thre_adapter <- 0.2
ggplot(dd) + geom_boxplot(aes(sample_type, trim_adapter), alpha = 0.3) + geom_violin(aes(sample_type, trim_adapter), alpha = 0.6) + geom_jitter(aes(sample_type, trim_adapter)) + labs(title = 'trim_adapter by sample_type by date') + facet_wrap(~date, scales = 'free_y') + geom_hline(yintercept = thre_adapter, color = "red")

Normal

4.3 Bioinformatics analysis stage

4.3.1 Effective utilization of analysis data

thre_eff_mut <- 0.2
ggplot(dd) + geom_boxplot(aes(sample_type, eff_mut), alpha = 0.3) + geom_violin(aes(sample_type, eff_mut), alpha = 0.6) + geom_jitter(aes(sample_type,  eff_mut), size = 0.2) + geom_hline(yintercept = thre_eff_mut, color = 'red') + labs(title = 'mut_dep/panel_dep by sample_type')

ggplot(dd) + geom_boxplot(aes(sample_type, eff_mut), alpha = 0.3) + geom_violin(aes(sample_type, eff_mut), alpha = 0.6) + geom_jitter(aes(sample_type,  eff_mut), size = 0.2) + geom_hline(yintercept = thre_eff_mut, color = 'red') + labs(title = 'mut_dep/panel_dep by sample_type by date') + facet_wrap(~date, ncol = 3, scales = 'free_y')

Stability of data utilization at the analysis stage (eff_mut): mean 0.8516142, SD 0.3797517; for cfDNA samples: mean 0.6043703, SD 0.4661001

4.3.2 dups

thre_dups <- 0.6
ggplot(dd) + geom_point(aes(samtools_dups, eff_mut, color = sample_type, size = eff_seq)) + geom_smooth(aes(samtools_dups, eff_mut)) + labs(title = 'eff_mut along with samtools_dups') + facet_wrap(~sample_type, scales = c('free'))
## `geom_smooth()` using method = 'loess'

ggplot(dd) + geom_boxplot(aes(sample_type, samtools_dups), alpha = 0.3) + geom_violin(aes(sample_type, samtools_dups), alpha = 0.6) + geom_jitter(aes(sample_type,  samtools_dups), size = 0.2) + geom_hline(yintercept = thre_dups, color = 'red') + labs(title = 'samtools_dups by date') + facet_wrap(~date, scales = c('free'))

Duplicates (dups) affect the analysis-stage utilization